# Training steps
First `cd syn_data_privacy`, then run the steps below in order.

### 1. Extract CLIP Embedding for CIFAR-10
Please specify `$DATA_PATH` and `$EMB_PATH` in `src/diffusion/scripts/extract.sh`. Then run the following command to extract CLIP embeddings for samples in CIFAR-10.

```bash
bash src/diffusion/scripts/extract.sh
```
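Each step asks you to set path variables inside a script. If you prefer not to open an editor, a minimal sketch like the one below can rewrite them in place. It assumes each script assigns the variable on its own line in the form `DATA_PATH=...`; we have not verified that layout, so check the script first. The helper name `set_script_var` and the example paths are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the repo): replace a line of the form NAME=...
# in a script with NAME=VALUE. Assumes the assignment starts at the beginning of a line.
# '|' is the sed delimiter so paths containing '/' need no escaping;
# VALUE must not contain '|', '&' or '\'.
set_script_var() {
  local file=$1 name=$2 value=$3
  if ! grep -q "^${name}=" "$file"; then
    echo "error: no '${name}=' assignment found in ${file}" >&2
    return 1
  fi
  sed -i.bak "s|^${name}=.*|${name}=${value}|" "$file"
  rm -f "${file}.bak"   # drop the backup that -i.bak creates
}

# Example (hypothetical paths):
#   set_script_var src/diffusion/scripts/extract.sh DATA_PATH /data/cifar10
#   set_script_var src/diffusion/scripts/extract.sh EMB_PATH /data/cifar10_clip_emb
```

The helper refuses to touch a script that has no matching assignment, so a typo in the variable name fails loudly instead of silently doing nothing.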

### 2. Fine-tune the Diffusion Model on CIFAR-10
Please specify `$DATA_PATH`, `$EMB_PATH`, `$LORA_PATH`, and `$LIRA_PATH` in `src/diffusion/scripts/train_lora.sh`. Then run the following command.

```bash
bash src/diffusion/scripts/train_lora.sh $exp_id $num_shadows
```

### 3. Generate Synthetic Dataset
After fine-tuning the model, use `src/diffusion/scripts/generate.sh` to generate synthetic training data on 8 GPUs in parallel. Before you start, set `$DATA_PATH`, `$LORA_PATH`, and `$SYN_PATH` in that script.

```bash
bash src/diffusion/scripts/generate.sh $exp_id $num_shadows
```

For each shadow model, this generates 15 different versions of the synthetic training dataset under `$SYN_PATH/exp_${exp_id}`. You can adjust `--num_per_image` to change the total number of versions. The script uses 8 GPUs by default; to use a different number, modify both the loop and `--nchunks`.
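The advice to change both the loop and `--nchunks` matches a common sharding pattern: launch one background process per GPU, pass each process the total number of chunks plus its own chunk index, and `wait` for all of them. The sketch below illustrates that pattern only. `NUM_GPUS`, `generate_chunk`, and its chunk-index argument are hypothetical stand-ins, not the actual contents of `generate.sh`.

```bash
#!/usr/bin/env bash
set -euo pipefail

NUM_GPUS=${NUM_GPUS:-8}   # hypothetical knob; generate.sh uses 8 GPUs by default

# Hypothetical stand-in for the real generation command. In generate.sh this would be
# the actual generation call, with --nchunks equal to the number of GPUs.
generate_chunk() {
  local chunk_id=$1 nchunks=$2
  echo "GPU ${CUDA_VISIBLE_DEVICES}: chunk ${chunk_id}/${nchunks}"
}

# Launch one chunk per GPU in the background, then wait for all of them.
run_all_chunks() {
  local nchunks=$1 i pid status=0
  local pids=()
  for ((i = 0; i < nchunks; i++)); do
    # Pin chunk i to GPU i; the loop bound and the chunk count come from one variable.
    CUDA_VISIBLE_DEVICES=$i generate_chunk "$i" "$nchunks" &
    pids+=("$!")
  done
  # Report failure if any chunk exited with a non-zero status.
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  return "$status"
}

# Example: run_all_chunks "$NUM_GPUS"
```

Deriving both the loop bound and the chunk count from a single value keeps them from drifting apart; if they disagree, some chunks would either never be launched or be requested with an out-of-range index.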

### 4. Train Models Using Synthetic Data
Please specify `$DATA_PATH`, `$LIRA_PATH`, and `$SYN_PATH` in `src/diffusion/scripts/train_shadow.sh`. Then run the following command.

```bash
bash src/diffusion/scripts/train_shadow.sh $exp_id $num_shadows
```

All models will be saved to `$LIRA_PATH/ckpts`.
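For reference, the four steps can be chained in one wrapper. The sketch below assumes each script takes exactly the arguments shown in the commands above and that the path variables have already been set inside the scripts. The function names `run_pipeline` and `run` and the `DRY_RUN` switch are hypothetical conveniences, not part of the repository.

```bash
#!/usr/bin/env bash
set -euo pipefail

SCRIPTS=src/diffusion/scripts

# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: run steps 1-4 in order for one experiment.
# Call from inside syn_data_privacy: run_pipeline EXP_ID NUM_SHADOWS
# With set -e, the first failing step stops the whole pipeline.
run_pipeline() {
  local exp_id=${1:?usage: run_pipeline EXP_ID NUM_SHADOWS}
  local num_shadows=${2:?usage: run_pipeline EXP_ID NUM_SHADOWS}
  run bash "$SCRIPTS/extract.sh"                                  # 1. CLIP embeddings
  run bash "$SCRIPTS/train_lora.sh" "$exp_id" "$num_shadows"      # 2. fine-tune
  run bash "$SCRIPTS/generate.sh" "$exp_id" "$num_shadows"        # 3. synthetic data
  run bash "$SCRIPTS/train_shadow.sh" "$exp_id" "$num_shadows"    # 4. train on synthetic data
}
```

Running `DRY_RUN=1 run_pipeline EXP_ID NUM_SHADOWS` first prints the four commands without executing them, which is a cheap way to check the arguments before committing GPU time.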
